Plan-based reward shaping for multi-agent reinforcement learning

Authors

Sam Devlin

Daniel Kudenko

Abstract

Recent theoretical results have justified the use of potential-based reward shaping as a way to improve the performance of multi-agent reinforcement learning (MARL). However, the question of how to generate a useful potential function remains open. Previous research demonstrated the use of STRIPS operator knowledge to automatically generate a potential function for single-agent reinforcement learning. Following up on this work, we investigate the use of STRIPS planning knowledge in the context of MARL. Our results show that a potential function based on joint or individual plan knowledge can significantly improve MARL performance compared with no shaping. In addition, we investigate the limitations of individual plan knowledge as a source of reward shaping in cases where combining the individual agents' plans causes conflict.
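The abstract combines two ideas: potential-based shaping, where the agent receives r + F(s, s') with F(s, s') = γΦ(s') − Φ(s), and a potential Φ derived from a STRIPS plan. The following is a minimal sketch of both, under assumptions that are not taken from the paper: the plan is given as an ordered list of abstract states, a hypothetical `abstract_state` function maps environment states onto plan steps, and Φ is a hypothetical constant `OMEGA` times the index of the matching plan step, with off-plan states given potential 0. It is an illustration, not the authors' exact formulation.

```python
from typing import Callable, Hashable, Sequence

GAMMA = 0.99   # discount factor (assumed value)
OMEGA = 10.0   # hypothetical scaling constant for the potential

Potential = Callable[[Hashable], float]


def plan_potential(plan: Sequence[Hashable],
                   abstract_state: Callable[[Hashable], Hashable]) -> Potential:
    """Build Phi(s) = OMEGA * index of abstract_state(s) in the plan.

    States whose abstract state does not appear in the plan get potential 0
    (a simplifying assumption for this sketch).
    """
    step_of = {step: i for i, step in enumerate(plan)}

    def phi(state: Hashable) -> float:
        return OMEGA * step_of.get(abstract_state(state), 0)

    return phi


def shaped_reward(reward: float, phi: Potential, s: Hashable, s_next: Hashable,
                  gamma: float = GAMMA) -> float:
    """Return r + F(s, s'), where F(s, s') = gamma * Phi(s') - Phi(s)."""
    return reward + gamma * phi(s_next) - phi(s)


if __name__ == "__main__":
    # Toy plan over hypothetical abstract states; states are already abstract here.
    plan = ["start", "flag_A", "flag_B", "goal"]
    phi = plan_potential(plan, lambda s: s)
    # Advancing one plan step: 0 + 0.99 * 20 - 10, i.e. about 9.8.
    print(shaped_reward(0.0, phi, "flag_A", "flag_B"))
```

Because F has the form γΦ(s') − Φ(s), its discounted sum along any trajectory telescopes to γ^T Φ(s_T) − Φ(s_0), which is why potential-based shaping guides exploration without changing which policies are optimal. In the multi-agent setting, the same function could be built either from a joint plan or from each agent's individual plan; the abstract notes that individual plans can conflict when combined.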


Similar resources

Abstract MDP Reward Shaping for Multi-Agent Reinforcement Learning

MDP Reward Shaping for Multi-Agent Reinforcement Learning Kyriakos Efthymiadis, Sam Devlin and Daniel Kudenko Department of Computer Science, The University of York, UK Abstract. Reward shaping has been shown to significantly improve an agent’s performance in reinforcement learning. As attention is shifting from tabula-rasa approaches to methods where some heuristic domain knowledge can be give...


Overcoming incorrect knowledge in plan-based reward shaping

Reward shaping has been shown to significantly improve an agent’s performance in reinforcement learning. Plan-based reward shaping is a successful approach in which a STRIPS plan is used in order to guide the agent to the optimal behaviour. However, if the provided knowledge is wrong, it has been shown the agent will take longer to learn the optimal policy. Previously, in some cases, it was bet...


Overcoming erroneous domain knowledge in plan-based reward shaping

Reward shaping has been shown to significantly improve an agent’s performance in reinforcement learning. Plan-based reward shaping is a successful approach in which a STRIPS plan is used in order to guide the agent to the optimal behaviour. However, if the provided domain knowledge is wrong, it has been shown the agent will take longer to learn the optimal policy. Previously, in some cases, it ...


Multi-agent, reward shaping for RoboCup KeepAway

This paper investigates the impact of reward shaping in multi-agent reinforcement learning as a way to incorporate domain knowledge about good strategies. In theory [2], potential-based reward shaping does not alter the Nash Equilibria of a stochastic game, only the exploration of the shaped agent. We demonstrate empirically the performance of state-based and state-action-based reward shaping in...


Theoretical considerations of potential-based reward shaping for multi-agent systems

Potential-based reward shaping has previously been proven to both be equivalent to Q-table initialisation and guarantee policy invariance in single-agent reinforcement learning. The method has since been used in multi-agent reinforcement learning without consideration of whether the theoretical equivalence and guarantees hold. This paper extends the existing proofs to similar results in multi-a...


Journal title:

Knowledge Eng. Review

Volume 31

Publication date: 2016
